Normalized Compression Distance Based Measures for MetricsMATR 2010

Authors

  • Marcus Dobrinkat
  • Tero Tapiovaara
  • Jaakko Väyrynen
  • Kimmo Kettunen
Abstract

We present the MT-NCD and MT-mNCD machine translation evaluation metrics as our submission to the machine translation evaluation shared task (MetricsMATR 2010). The metrics are based on normalized compression distance (NCD), a general information-theoretic measure of string similarity, and are evaluated against human judgments from the WMT08 shared task. The experiments show that (1) flexible matching improves our metric's correlation with human judgments, (2) segment replication is effective, and (3) our NCD-inspired method for multiple references indicates improved results. In general, the proposed MT-NCD and MT-mNCD methods correlate competitively with human judgments compared with commonly used machine translation evaluation metrics such as BLEU.
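For readers unfamiliar with NCD, the following is a minimal sketch of the plain distance, using the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length. It does not reproduce MT-NCD or MT-mNCD: the abstract does not say which compressor the authors used, so zlib is an assumption here, and the names `compressed_size` and `ncd` are hypothetical. Flexible matching, segment replication, and multiple references are not covered.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (compressor choice is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two strings.

    Lower values mean the strings share more compressible structure.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # compress the concatenation of x and y
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Illustrative sentences only, not data from the paper.
    reference = "The quick brown fox jumps over the lazy dog near the river bank."
    close = "The quick brown fox jumped over a lazy dog near the river bank."
    unrelated = "Stock markets fell sharply on Tuesday amid fears of rising inflation."
    print("close candidate:    ", ncd(reference, close))
    print("unrelated candidate:", ncd(reference, unrelated))
```

Used as an evaluation score, a candidate translation would be compared with its reference and a smaller distance read as a closer match. Real compressors add fixed overhead, so scores on very short segments can be noisy.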


Similar resources

Normalized Information Distance is Not Semicomputable

Normalized information distance (NID) uses the theoretical notion of Kolmogorov complexity, which for practical purposes is approximated by the length of the compressed version of the file involved, using a real-world compression program. This practical application is called ‘normalized compression distance’ and it is trivially computable. It is a parameter-free similarity measure based on comp...


New Similarity Measures of Fuzzy Soft Sets Based on Distance Measures

Similarity measure is a very important problem in fuzzy soft set theory. In this paper, seven similarity measures of fuzzy soft sets are introduced, which are based on the normalized Hamming distance, the normalized Euclidean distance, the generalized normalized distance, the Type-2 generalized normalized distance, the Type-2 normalized Euclidean distance, the Hausdorff distance and the Chebysh...


Chapter 3 Normalized Information Distance

The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to approximate the Kolmogorov complexity if the objects have a string representation. Second, for names and abstract concepts, page count statistics from the World Wid...


Normalized Compression Distance as automatic MT evaluation metric

This paper evaluates a new automatic MT evaluation metric, Normalized Compression Distance (NCD), which is a general tool for measuring similarities between binary strings. We provide system-level correlations and sentence-level consistencies to human judgements and comparison to other automatic measures with the WMT’08 dataset. The results show that the general NCD metric is at the same level ...


Perceptual Normalized Information Distance for Image Distortion Analysis Based on Kolmogorov Complexity

Image distortion analysis is a fundamental issue in many image processing problems, including compression, restoration, recognition, classification, and retrieval. In this work, we investigate the problem of image distortion measurement based on the theories of Kolmogorov complexity and normalized information distance (NID), which have rarely been studied in the context of image processing. Bas...



Publication date: 2010